Abstract

We study a generalization of the classical median finding problem to the batched query
case: given an unsorted array of n items and k (not necessarily disjoint) intervals in
the array, the goal is to determine the median in each of the intervals. We give an
algorithm that uses O(n log k + k log k log n) comparisons and show a lower bound of
Ω(n log k) comparisons for this problem. This is optimal for k = O(n/log n).


1 Introduction

The classical median finding problem is to find the median item, that is, the item of rank
⌈n/2⌉ in an unsorted array of size n. We focus on the comparison model, where items in the
array can be accessed only through comparisons, and we count the number of comparisons
performed by an algorithm. It has been known since the 1970s that this problem can be solved
using O(n) comparisons in the worst case [BFP+73]. Later research [BJ85, SPP76, DZ99, DZ01]
showed that the number of comparisons needed for solving the median finding problem is
between (2 + ε)n and 2.95n in the worst case (for deterministic algorithms). Closing this gap
for deterministic algorithms is an open problem, but, surprisingly, one can find the median
using 1.5n + o(n) comparisons with a randomized algorithm [MR95].
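The worst-case linear-time selection algorithm of [BFP+73], which later sections call the classical (or standard deterministic) algorithm, can be sketched as follows. This is a minimal Python sketch of the median-of-medians idea with groups of five, not tuned to minimize the number of comparisons; the function names are ours, and `median` returns the element of rank ⌈m/2⌉ of an m-element list, which is an assumed convention.

```python
def select(items, rank):
    """Return the element of the given rank (1-indexed) in the list `items`.

    Median-of-medians selection with groups of five, which needs O(len(items))
    comparisons in the worst case.
    """
    if not 1 <= rank <= len(items):
        raise ValueError("rank out of range")
    if len(items) <= 5:
        return sorted(items)[rank - 1]
    # Median of each group of (at most) five consecutive elements.
    groups = [sorted(items[i:i + 5]) for i in range(0, len(items), 5)]
    medians = [g[(len(g) - 1) // 2] for g in groups]
    # The median of the medians leaves at most about 7/10 of the input on
    # either side, so the single recursive call below shrinks geometrically.
    pivot = select(medians, (len(medians) + 1) // 2)
    smaller = [x for x in items if x < pivot]
    larger = [x for x in items if x > pivot]
    equal = len(items) - len(smaller) - len(larger)
    if rank <= len(smaller):
        return select(smaller, rank)
    if rank <= len(smaller) + equal:
        return pivot
    return select(larger, rank - len(smaller) - equal)


def median(items):
    """The element of rank ceil(m/2) of an m-element list (assumed convention)."""
    return select(items, (len(items) + 1) // 2)
```

Duplicates are handled by counting the copies of the pivot, although the paper itself assumes distinct elements.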
We study the following generalization of the median problem.


The k-range-medians Problem. The input is an unsorted array S with n entries. A
sequence of k queries Q_1, ..., Q_k is provided. A query Q_j = [l_j, r_j] is an interval of the array,
and the output is x_1, ..., x_k, where

$$x_j = \operatorname{median}\langle S[l_j], S[l_j + 1], \ldots, S[r_j]\rangle$$

for j = 1, ..., k. We refer to this as the k-range-medians problem. The goal is to build
a data structure for S that can answer these queries quickly. Notice that the
intervals may overlap.
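As a baseline, and to pin down the indexing, a direct solution simply sorts every query interval. The sketch below is ours (the function name is hypothetical); it uses 1-indexed inclusive intervals [l_j, r_j] as above and assumes that the median of m elements is the element of rank ⌈m/2⌉.

```python
def range_medians_naive(S, queries):
    """Answer k-range-median queries by sorting each interval separately.

    `queries` holds 1-indexed, inclusive pairs (l, r) as in the text. The median
    of an interval of length m is taken to be its element of rank ceil(m/2).
    Cost: O(m log m) comparisons per query of length m, i.e. O(k n log n) in
    the worst case.
    """
    answers = []
    for l, r in queries:
        window = sorted(S[l - 1:r])        # S[l..r] in the paper's notation
        answers.append(window[(len(window) + 1) // 2 - 1])
    return answers
```

For example, `range_medians_naive([5, 1, 4, 2, 3], [(1, 5), (2, 3), (2, 4)])` returns `[3, 1, 2]`.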
This is the interval version of the classical median finding problem, and it is interesting
in its own right. In addition, there are many motivating scenarios where it arises.

Examples. A motivation arises in analyzing logs of internet advertisements (aka ads). We
have the log of clicks on ads on the internet: each record gives the time of the click as well
as the varying price paid by the advertiser for the click, and the log is arranged in
time-indexed order. Then, S[i] is the price for the ith click. Any given advertiser runs several ad
campaigns simultaneously, spread over different intervals of time. The advertiser then wishes
to compare his cost to the general ad market during the periods his campaigns ran, and a
typical comparison is to the median price paid for clicks during those time intervals. This
yields an instance of the k-range-medians problem for a possibly intersecting set of intervals.

As another example, consider IP networks where one collects what are known as SNMP
logs: for each link that connects two routers, one collects the total bytes sent on that link
in each fixed-length time period, say 5 minutes [KMZ03]. Then, S[i] is the number of bytes
sent on that link in the ith time duration. A traffic analyst is interested in finding the 
median value of the traffic level within a specific time window such as a week, office hours, 
or weekends, or the median within each such time window. Equally, the analyst is sometimes 
interested in median traffic levels during specific external events such as the time duration 
when an attack happened or a new network routing strategy was tested. 

There are other attributes in addition to time where applications may solve range median 
problems. For example, S[i] may be the total value of real estate sold in postal zipcode area i
arranged in sorted order, and an analyst may be interested in the median value for a borough 
or a city represented by a consecutive set of zipcodes. 

One can ask similar interval versions of other problems too, for example, the median may 
be replaced by (say) the maximum, minimum, mode or even the sum. 

• For sum, a trivial O(n) preprocessing step that computes all the prefix sums P[j] = Σ_{i≤j} S[i]
suffices to answer any interval query Q_j = [l_j, r_j] in optimal O(1) time using P[r_j] − P[l_j − 1].

• If the summation operator (i.e., Σ) is replaced by a semigroup operator (where subtraction
is not available), then S can be preprocessed in O(nk) space and time, and
each query can be answered in O(α_k(n)) time, where α_k is a slow-growing function [Yao82];
this is optimal under general semigroup conditions [Yao85].

• For special cases of the semigroup operator, such as the maximum or minimum,
a somewhat nontrivial algorithm is needed to get the same optimal bounds as for the Σ
case (see, for example, [BFC04]).
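The first and third items can be made concrete with a few lines of Python. Prefix sums follow the first item directly. For range minimum, the sketch uses a sparse table, a standard but weaker stand-in (O(n log n) preprocessing, O(1) query) rather than the linear-preprocessing structures cited above. Class names are ours, and queries use 1-indexed inclusive intervals as in the text.

```python
from itertools import accumulate


class PrefixSums:
    """Range sums in O(1) per query after O(n) preprocessing."""

    def __init__(self, S):
        # P[j] = S[1] + ... + S[j] (1-indexed), with P[0] = 0.
        self.P = [0] + list(accumulate(S))

    def query(self, l, r):
        """Sum of S[l..r], 1-indexed and inclusive."""
        return self.P[r] - self.P[l - 1]


class SparseTableMin:
    """Range minimum in O(1) per query after O(n log n) preprocessing."""

    def __init__(self, S):
        # table[j][i] = min(S[i : i + 2**j]) (0-indexed).
        self.table = [list(S)]
        j = 1
        while (1 << j) <= len(S):
            prev = self.table[-1]
            half = 1 << (j - 1)
            self.table.append([min(prev[i], prev[i + half])
                               for i in range(len(S) - (1 << j) + 1)])
            j += 1

    def query(self, l, r):
        """Minimum of S[l..r], 1-indexed and inclusive."""
        lo, hi = l - 1, r                  # half-open 0-indexed range [lo, hi)
        j = (hi - lo).bit_length() - 1     # largest power of two <= length
        # Two (possibly overlapping) blocks of length 2**j cover [lo, hi).
        return min(self.table[j][lo], self.table[j][hi - (1 << j)])
```

The overlap trick in `query` works for minimum and maximum because they are idempotent; it would double-count for sums, which is why sums use prefix differences instead.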

The median operator is not a semigroup operator and presents a more difficult problem. The
only prior results we know of are obtained by using the various tradeoffs shown in [KMS05].
For the case k = 1, the interesting tradeoffs for preprocessing time and query time are,
respectively, roughly O(n log² n) and O(log n), or O(n²) and O(1), or O(n) and O(n^ε) for a
constant fraction ε [KMS05]. These bounds for individual queries can be directly applied to
each of the k interval queries in our problem, resulting in a multiplicative k factor in the query
complexity. In particular, the work of Krizanc et al. [KMS05] implies an O(n log² n + k log n)
time algorithm for our problem.
Our main result is as follows. 

Theorem 1.1 There is a deterministic algorithm to solve the k-range-medians problem in
O(n log k + k log k log n) time. Furthermore, in the comparison model, any algorithm that
solves this problem requires Ω(n log k) comparisons.

The k-range-medians problem seems to be a fairly basic problem, and it is worthwhile to
have tight bounds for it. In particular, Θ(n log k) may not be the bound one suspects at
first glance to be tight for this problem. For k = O(n/log n), our algorithm is optimal. It
also improves on [KMS05] for k = O(n).

The lower bound holds even if the set of intervals is hierarchical, that is, for any two 
intervals in the set, either one of them is contained in the other, or they are disjoint. On the 
other hand, the upper bound holds even if the queries arrive online, in the amortized sense. 
Our algorithm uses relaxed sorting on pieces of the array, where only a subset of the items in a
piece are in their correct sorted locations. Relaxed sorting like this has been used before for
other problems; see, for example, [AY89].

In the following, the kth element of a set S (or the element of rank k) refers to the kth
smallest element in the set S. For simplicity, we assume that the elements of S are all distinct.

2 The Lower Bound 

Recall that S is an unsorted array of n elements. Assume that n is a multiple of k. Let
Ψ(n, k) = { in/k | i = 1, ..., k }, for n > k > 0. We say that an element of S is the ith element
of S if its rank in S is i.

Claim 2.1 Any algorithm MedianAlg that computes all the elements of S whose ranks are in Ψ(n, k)
needs to perform Ω(n log k) comparisons in the worst case.

Proof: Let m_i = in/k, for i = 0, ..., k. An element is labeled i if it is larger
than the m_{i−1}th element of S and smaller than the m_i th element of S (note that the m_k th
element of S is the largest element in S). An element is unlabeled if its rank in S is
in Ψ(n, k).

Note that the output of the algorithm is the indices of the k unlabeled elements. We
will argue that just computing these k indices requires Ω(n log k) comparisons.

Consider an execution of MedianAlg on S. We consider the comparison tree model,
where the input travels down the decision tree from the root: at each vertex a comparison is
made, and the input is directed to the left or right child depending on the
result of the comparison.

A labelling (at a vertex v of the decision tree) is consistent with the comparisons seen so
far by the algorithm if there is an input with this labelling that agrees with all the
comparisons seen so far and reaches v during the execution. Let Z be the set of labellings
of S consistent with the comparisons seen so far at this vertex v.



We claim that if |Z| > 1 then the algorithm cannot yet terminate. Indeed, in such a
case there are at least two different labellings that are consistent with the comparisons seen
so far. If not all the labellings of Z have the same set of k elements marked as unlabeled,
then two inputs reaching v require different outputs (the output is just the indices of the
unlabeled elements), and as such the algorithm cannot terminate.

So, let S[a] be an element that has different labels in two labellings of Z. There exist
two distinct inputs B = [b_1, ..., b_n] and C = [c_1, ..., c_n] that realize these two labellings.
Now consider the input D(t) = [d_1(t), ..., d_n(t)], where d_i(t) = b_i(1 − t) + t·c_i, for t ∈ [0, 1]
and i = 1, ..., n. We can perturb the numbers b_1, ..., b_n and c_1, ..., c_n so that there is never
a t ∈ [0, 1] for which three entries of D(·) are equal to each other (this can be guaranteed
by adding random infinitesimal noise to each number, and observing that this bad event
has probability zero). Note that D(0) = B and D(1) = C.

Furthermore, since for the inputs B and C our algorithm had reached the same node 
(i.e., v) in the decision tree, it holds that for all the comparisons the algorithm performed 
so far, it got exactly the same results for both inputs. 

Now, assume without loss of generality that the label of b_a in B is strictly smaller than
the label of c_a in C. Clearly, for some value of t in [0, 1], denoted by t*, d_a(t*) must have
rank in the set {m_1, ..., m_k}. Indeed, as t increases from 0 to 1, the rank of d_a(t) starts
at the rank of b_a in B and ends at the rank of c_a in C; since no three entries of D(·) ever
coincide, this rank changes by at most one at a time, so it must pass through some m_i. But
D(t*) agrees with all the comparisons seen by the algorithm so far (since if b_i < b_j and
c_i < c_j then d_i(t) < d_j(t) for t ∈ [0, 1]). We conclude that the labelling realized by D(t*)
must leave d_a(t*) unlabeled. Namely, the set Z has two labellings with different sets of k
unlabeled elements, and as such the algorithm cannot terminate and must perform more
comparisons if it reached v (i.e., v is not a leaf of the decision tree).

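Thus, the algorithm can terminate only when |Z| = 1. Let β = n/k − 1, and observe that
at the beginning of the execution of MedianAlg there are

$$|Z| = \frac{n!}{k!\,(\beta!)^{k}}$$

possible labellings for the output. Indeed, a consistent labelling is made out of k
unlabeled elements, and then β elements are labeled i, for i = 1, ..., k. Now, by Stirling's
approximation (in the form n! ≥ (n/e)^n), together with log k! ≤ k log k, kβ ≤ n and β < n/k,
we have

$$\log_2 |Z| = \log_2 n! - \log_2 k! - k\log_2 \beta! \;\ge\; n\log_2\frac{n}{e} - k\log_2 k - n\log_2\frac{n}{k} \;=\; (n-k)\log_2 k - n\log_2 e,$$

which is Ω(n log k) for k ≤ n/2 and k at least a sufficiently large constant. (For constant k,
the bound Ω(n) is immediate: n ∈ Ψ(n, k), and finding the maximum alone requires n − 1
comparisons.)

Each comparison performed can, in the worst case, only halve this set of possible labellings.
It follows that, in the worst case, the algorithm needs

$$\log_2 |Z| = \Omega(n\log k)$$

comparisons, as claimed. ■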
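The counting step can be sanity-checked numerically. The sketch below is our own illustration, not part of the proof: it evaluates log₂|Z| exactly through `math.lgamma`, compares it with the lower bound (n − k) log₂ k − n log₂ e derived above, and prints the ratio log₂|Z| / (n log₂ k) for a few sizes. It assumes n is a multiple of k.

```python
import math


def log2_num_labellings(n, k):
    """log2 of n! / (k! * (beta!)**k) with beta = n/k - 1 (n a multiple of k)."""
    assert n % k == 0 and 0 < k <= n
    beta = n // k - 1
    # lgamma(m + 1) = ln(m!), so this is ln|Z| computed without overflow.
    ln_z = math.lgamma(n + 1) - math.lgamma(k + 1) - k * math.lgamma(beta + 1)
    return ln_z / math.log(2)


def stirling_lower_bound(n, k):
    """The bound (n - k) * log2(k) - n * log2(e) derived in the proof."""
    return (n - k) * math.log2(k) - n * math.log2(math.e)


for n, k in [(2**12, 2**4), (2**16, 2**8), (2**20, 2**10)]:
    exact = log2_num_labellings(n, k)
    print(n, k, round(exact), round(stirling_lower_bound(n, k)),
          round(exact / (n * math.log2(k)), 3))
```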

Lemma 2.2 Solving the k-range-medians problem requires Ω(n log k) comparisons.
Proof: We will show that, given an algorithm for the k-range-medians problem, the problem
of Claim 2.1 can be reduced to it in linear time. That would immediately imply the
lower bound.

Given an input array S of size n, construct a new array T of size 4n, where the first n
elements of T are −∞, T[n + 1, ..., 2n] = S, and T[j] = +∞ for j = 2n + 1, ..., 4n. Clearly,
the ith element of S is the median of the range [1, 2n + 2i − 1] in T: this range has odd length
2n + 2i − 1, so its median has rank n + i, and the first n ranks are taken by the copies of −∞.
Thus, we can solve the problem of Claim 2.1 using k median range queries, implying the lower
bound. ■
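The reduction in the proof of Lemma 2.2 is easy to state in code. In this hedged sketch, `range_medians` stands for any k-range-medians solver (a naive sort-each-interval solver is included only to make it runnable); intervals are 1-indexed and inclusive, and the median of m elements is taken to be the element of rank ⌈m/2⌉, which is what makes the odd-length ranges [1, 2n + 2i − 1] return the ith element of S. Both function names are ours.

```python
import math


def ranks_via_range_medians(S, ranks, range_medians):
    """Find the elements of the given ranks in S with one range-median query each.

    Builds T = [-inf]*n + S + [+inf]*(2n), as in the proof of Lemma 2.2, and
    asks for the median of T[1 .. 2n + 2i - 1] for each requested rank i.
    Each such range has odd length, so its median (rank ceil(m/2)) has rank n + i.
    """
    n = len(S)
    T = [-math.inf] * n + list(S) + [math.inf] * (2 * n)
    return range_medians(T, [(1, 2 * n + 2 * i - 1) for i in ranks])


def naive_range_medians(T, queries):
    """Reference k-range-medians solver: sort each 1-indexed inclusive interval."""
    out = []
    for l, r in queries:
        window = sorted(T[l - 1:r])
        out.append(window[(len(window) + 1) // 2 - 1])
    return out
```

Asking for the ranks in Ψ(n, k) = {n/k, 2n/k, ..., n} turns any k-range-medians algorithm into an algorithm for Claim 2.1 with only O(n) extra work.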

Observe that the lower bound holds even for the case when the intervals are hierarchical. 

3 Our Algorithm 

We first consider the case when all the query intervals are provided ahead of time. We will 
present a slow algorithm first, and later show how to make it faster to get our bounds. Our 
algorithm uses the following folklore result. 

Theorem 3.1 Given ℓ sorted arrays with total size n, there is a deterministic algorithm
to determine the median of the set formed by the union of these arrays using O(ℓ log(n/ℓ))
comparisons.

Since we were unable to find a reference to precisely this result beyond [KMS05], where a
slightly weaker result is stated as a folklore claim, we describe this algorithm in Appendix A.
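Before turning to the details in Appendix A, a simpler (and asymptotically weaker) way to select from ℓ sorted arrays may help build intuition. The sketch below is not the algorithm of Theorem 3.1: it repeatedly takes the length-weighted median of the middle elements of the active ranges as a pivot, counts the elements ≤ pivot by binary search, and discards the side that cannot contain the answer. Each round removes roughly a quarter of the active elements, so it uses O(ℓ log n) comparisons per round and O(ℓ log² n) overall, rather than O(ℓ log(n/ℓ)). Elements are assumed distinct, and the function name is ours.

```python
from bisect import bisect_right


def select_sorted_arrays(arrays, k):
    """Return the k-th smallest element (1-indexed) of the union of sorted lists.

    Simplified pivot-and-prune scheme (not the algorithm of Appendix A):
    each round uses the length-weighted median of the middle elements of the
    active ranges as a pivot and keeps only the side holding the answer.
    Elements are assumed to be distinct across all arrays.
    """
    lo = [0] * len(arrays)               # active range of array i is [lo[i], hi[i])
    hi = [len(a) for a in arrays]
    total = sum(hi)
    if not 1 <= k <= total:
        raise ValueError("k out of range")
    while total > 16:
        mids = sorted((arrays[i][(lo[i] + hi[i]) // 2], hi[i] - lo[i])
                      for i in range(len(arrays)) if hi[i] > lo[i])
        acc = 0
        for value, weight in mids:       # weighted median of the middles
            acc += weight
            if 2 * acc >= total:
                pivot = value
                break
        # c[i] = number of active elements of array i that are <= pivot.
        c = [bisect_right(arrays[i], pivot, lo[i], hi[i]) - lo[i]
             for i in range(len(arrays))]
        if k <= sum(c):                  # answer is <= pivot: drop the larger side
            hi = [lo[i] + c[i] for i in range(len(arrays))]
        else:                            # answer is > pivot: drop the smaller side
            k -= sum(c)
            lo = [lo[i] + c[i] for i in range(len(arrays))]
        total = sum(h - l for l, h in zip(lo, hi))
    rest = sorted(x for a, l, h in zip(arrays, lo, hi) for x in a[l:h])
    return rest[k - 1]
```

The weighted median guarantees that at least about a quarter of the active elements lie on each side of the pivot, which is what makes every round shrink the problem by a constant factor.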

3.1 A Slow Algorithm 

Here we show how to solve the k-range-medians problem.

Let I_1, ..., I_k be the given (not necessarily disjoint) k intervals in the array S[1..n]. We
break S into (at most) 2k − 1 atomic disjoint intervals, labeled in sorted order B_1, ..., B_t,
such that no atomic interval has an endpoint of any I_j inside it. Next, we sort
each one of the B_i's, and build a balanced binary tree having B_1, ..., B_t as the leaves in
this order. In a bottom-up fashion we merge the sorted arrays stored in the leaves, so that
each node v stores a sorted array S_v of all the elements stored in its subtree. Let T denote
this tree, which has height O(log k).

Now, computing the median of an interval I_j is done by extracting the O(log k) suitable
nodes in T that cover I_j. Next, we apply Theorem 3.1 and, using O(log n log k) comparisons,
we get the desired median. We now apply this to the k given intervals. Observe that sorting
the atomic intervals takes O(n log n) comparisons, and merging them in O(log k) levels takes
O(n log k) comparisons in all. This gives:

Lemma 3.2 The algorithm above uses O(n log n + k log n log k) comparisons.

Note that this algorithm is still mildly interesting. Indeed, if the intervals I_1, ..., I_k are
all "large", then the running time of the naive algorithm is O(nk), and the above algorithm
is faster for k > log n.
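The slow algorithm is short enough to sketch in full. In this Python version (our own, with hypothetical names), the atomic intervals are the pieces between consecutive query endpoints, each node of the merge tree stores the sorted union of its leaves, and a query collects the O(log k) canonical nodes covering it. As a simplification, the final selection scans a heap merge of the covering lists, so this sketch does not achieve the O(log n log k) per-query bound of Theorem 3.1. Intervals are 1-indexed and inclusive, and the median of m elements is assumed to be the element of rank ⌈m/2⌉.

```python
from bisect import bisect_left
from heapq import merge
from itertools import islice


def range_medians_slow(S, queries):
    """Offline k-range-medians following the slow algorithm of Section 3.1."""
    n = len(S)
    # Cut points between atomic intervals: every query endpoint, plus 0 and n.
    # Piece t is the half-open 0-indexed slice S[cuts[t]:cuts[t + 1]].
    cuts = sorted({0, n} | {l - 1 for l, _ in queries} | {r for _, r in queries})
    pieces = [sorted(S[a:b]) for a, b in zip(cuts, cuts[1:])]
    t = len(pieces)
    tree = {}  # node id -> sorted list of all elements below that node

    def build(v, a, b):
        if b - a == 1:
            tree[v] = pieces[a]
            return
        mid = (a + b) // 2
        build(2 * v, a, mid)
        build(2 * v + 1, mid, b)
        tree[v] = list(merge(tree[2 * v], tree[2 * v + 1]))

    def cover(v, a, b, qa, qb, out):
        # Append the canonical nodes whose pieces exactly tile pieces [qa, qb).
        if qb <= a or b <= qa:
            return
        if qa <= a and b <= qb:
            out.append(tree[v])
            return
        mid = (a + b) // 2
        cover(2 * v, a, mid, qa, qb, out)
        cover(2 * v + 1, mid, b, qa, qb, out)

    build(1, 0, t)
    answers = []
    for l, r in queries:
        qa, qb = bisect_left(cuts, l - 1), bisect_left(cuts, r)
        lists = []
        cover(1, 0, t, qa, qb, lists)
        target = (r - l + 2) // 2 - 1      # 0-indexed position of rank ceil(m/2)
        answers.append(next(islice(merge(*lists), target, None)))
    return answers
```

Sorting the pieces costs O(n log n) and building the tree costs O(n log k), matching the two terms that Lemma 3.2 attributes to preprocessing.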



3.2 Our Main Algorithm 

The main bottleneck in the above solution was the presorting of the pieces of the array 
corresponding to atomic intervals. In the optimal algorithm below, we do not fully sort 
them. 

Definition 3.3 A subarray X is u-sorted if there is a sorted list C_X of at most (say) 2u
elements of X such that these elements appear in this sorted order in X (not necessarily as
consecutive elements). Furthermore, for an element a of C_X, all the elements of X smaller
than a appear before it in X, and all the elements larger than a appear after it in X. Finally,
we require that the distance between two consecutive elements of C_X in X is at most |X|/u,
where |X| denotes the size of X. We refer to the elements of X between two consecutive
elements of C_X as a segment.

An array X of n elements that is n-sorted is just sorted, and a 1-sorted array is essentially
unsorted. Another way to look at it is that the elements of C_X are in their final positions in
the sorted order, while the elements of each segment are in arbitrary order.

Lemma 3.4 Given an unsorted array X, it can be u-sorted using O(|X| log u) comparisons,
where |X| denotes the number of elements of X.

Proof: We just find the median of X, partition X around it into two subarrays of equal size,
and continue recursively on the two subarrays until every subarray has at most |X|/u
elements. The depth of the recursion is O(log u), and the work at each level of the recursion
is linear, which implies the claim. ■
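A direct Python rendering of the recursive procedure in the proof of Lemma 3.4, together with a checker for Definition 3.3, may help fix the definition. To keep the sketch short it finds each median by sorting (so it does not achieve the O(|X| log u) bound; a linear-time selection routine would), it returns the positions of C_X in the rearranged copy, and it assumes distinct elements. The names `u_sort` and `is_u_sorted` are ours.

```python
def u_sort(X, u):
    """Return (Y, positions): a u-sorted copy Y of X and the positions of C_Y.

    Follows the proof of Lemma 3.4: place the median of the current subarray at
    its final position, partition around it, and recurse until subarrays have at
    most len(X)/u elements. Assumes distinct elements; the median is found by
    sorting here, whereas a linear-time selection would give O(|X| log u).
    """
    Y = list(X)
    limit = len(Y) / u
    positions = []

    def rec(a, b):                       # operate on the half-open slice Y[a:b]
        if b - a <= limit:
            return
        part = Y[a:b]
        med = sorted(part)[(len(part) - 1) // 2]
        smaller = [x for x in part if x < med]
        larger = [x for x in part if x > med]
        Y[a:b] = smaller + [med] + larger
        pos = a + len(smaller)           # final sorted position of med
        positions.append(pos)
        rec(a, pos)
        rec(pos + 1, b)

    rec(0, len(Y))
    return Y, sorted(positions)


def is_u_sorted(Y, positions, u):
    """Check Definition 3.3 for the splitter positions `positions` in Y."""
    if any(Y[p] >= Y[q] for p, q in zip(positions, positions[1:])):
        return False                     # splitters must appear in sorted order
    bounds = [-1] + positions + [len(Y)]
    for left, right in zip(bounds, bounds[1:]):
        segment = Y[left + 1:right]
        if len(segment) > len(Y) / u:    # segment longer than |X|/u
            return False
        if left >= 0 and any(x < Y[left] for x in segment):
            return False
        if right < len(Y) and any(x > Y[right] for x in segment):
            return False
    return True
```

Since every subarray at recursion depth d has at most |X|/2^d elements and is split only while it exceeds |X|/u, fewer than 2u splitters are produced, consistent with the (say) 2u bound in Definition 3.3.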

Lemma 3.5 Given two u-sorted arrays X and Y, they can be merged into a u-sorted
array using O(|X| + |Y|) comparisons.

Proof: Convert Y into a linked list. Insert the elements of C_X into Y. This can be done by
scanning the list of Y until we arrive at the segment Y_i of Y that should contain an element
h of C_X that we need to insert. We partition this segment using h into two intervals, add h to
C_Y, and continue in this fashion with each such h. This takes O(|Y_i|) = O(|Y|/u) comparisons
per h (ignoring the scanning cost, which is O(|Y|) overall). Let Z be the resulting u-sorted
array, which contains all the elements of Y and all the elements of C_X, and C_Z = C_X ∪ C_Y.
Computing Z takes

$$O\left(\frac{|Y|}{u}\,|C_X| + |Y|\right) = O(|Y|)$$

comparisons, since |C_X| ≤ 2u.

We now need to insert the elements of X \ C_X into Z. Clearly, if a segment X_i of X has
a_i elements of C_Z in its range, then inserting the elements of X_i by binary search takes
O(|X_i| log(a_i + 1)) comparisons. Thus, the total number of comparisons is

$$O\Big(\sum_i |X_i|\log(a_i+1)\Big) = O\Big(\sum_i \frac{|X|}{u}\log(a_i+1)\Big) = O\Big(\frac{|X|}{u}\sum_i a_i\Big) = O(|X|),$$

since |X_i| ≤ |X|/u, log(a_i + 1) ≤ a_i, and Σ_i a_i = O(u).



The final step is to scan over Z and merge consecutive segments that are too small
(removing the corresponding elements from C_Z), so that each segment has length at most
|Z|/u. Clearly, this can be done in linear time. The resulting Z is u-sorted, since its sorted
list contains at most 2u + 1 elements and every segment has length at most |Z|/u. ■

Note that the final filtering stage in the above algorithm is needed to guarantee that the
size of the resulting list C_Z does not grow too large when this merging step is used several times.
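The invariants of Lemma 3.5 can be exercised with a simplified sketch. Here a u-sorted array is represented, as an assumption for convenience, by a list of unsorted buckets (the segments) interleaved with a sorted list of splitters (C_X). Elements of both inputs are routed to buckets by binary search over C_X ∪ C_Y, which costs O(log u) comparisons per element rather than the linear total of the linked-list procedure in the proof; the greedy coarsening pass plays the role of the final filtering step. The function name is hypothetical.

```python
from bisect import bisect_left


def merge_u_sorted(X, Y, u):
    """Merge two u-sorted arrays, each given as a (buckets, splitters) pair.

    buckets[j] holds the (unsorted) segment between splitters[j - 1] and
    splitters[j]; len(buckets) == len(splitters) + 1. Elements are distinct.
    """
    (bx, sx), (by, sy) = X, Y
    splitters = sorted(sx + sy)                      # C_Z = C_X ∪ C_Y
    buckets = [[] for _ in range(len(splitters) + 1)]
    for bucket in bx + by:
        for x in bucket:
            # Number of splitters below x = index of the segment holding x.
            buckets[bisect_left(splitters, x)].append(x)
    limit = (sum(map(len, buckets)) + len(splitters)) / u   # |Z| / u
    # Filtering step: drop a splitter whenever the two segments around it,
    # together with the splitter itself, still fit within |Z| / u.
    out_buckets, out_splitters = [buckets[0]], []
    for s, b in zip(splitters, buckets[1:]):
        if len(out_buckets[-1]) + 1 + len(b) <= limit:
            out_buckets[-1] = out_buckets[-1] + [s] + b
        else:
            out_splitters.append(s)
            out_buckets.append(b)
    return out_buckets, out_splitters
```

Before filtering, each bucket holds elements of at most one segment of X and one segment of Y, so it has at most |X|/u + |Y|/u = |Z|/u elements; the greedy pass keeps that bound while leaving at most 2u splitters.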

In the following, we need a modified version of Theorem 3.1 that works for u-sorted
arrays.

Theorem 3.6 Given ℓ u-sorted arrays A_1, ..., A_ℓ with total size n and a rank k, there is a
deterministic algorithm that returns ℓ subintervals B_1, ..., B_ℓ of these arrays and a number
k′, such that the following properties hold.

(i) The k′th ranked element of B_1 ∪ ⋯ ∪ B_ℓ is the kth ranked element of A_1 ∪ ⋯ ∪ A_ℓ.

(ii) The running time is O(ℓ log(n/ℓ)).

(iii) Σ_{i=1}^{ℓ} |B_i| = O(ℓ · n/u).

Proof: For every element of C_{A_i} realizing the u-sorting of the array A_i, we assume that
its rank in A_i is precomputed. Now, we execute the algorithm of Theorem 3.1 on these
(representative) sorted arrays, taking into account their associated ranks. (The
required modifications of the algorithm of Theorem 3.1 are tedious but straightforward, and
we omit the details.) The main problem is that now the rank of an element is only estimated
approximately, up to an additive error of n/u. At the end of the process of trimming down the
representative arrays, we might still have active intervals of total length 2n/u in each one of
these arrays, resulting in the bound on the size of the computed intervals. ■

Using the theorem above together with the two lemmas above, we get the following result, which
builds up to the algorithmic part of Theorem 1.1.

Lemma 3.7 There is a deterministic algorithm to solve the k-range-medians problem in
O(n log k + k log k log n) time when the k query intervals are provided in advance.

Proof: We repeat the algorithm of Section 3.1 using u-sorting instead of sorting, for u
to be specified shortly. Building the data structure (i.e., the tree over the atomic intervals)
takes O(n log u) comparisons. Indeed, we first u-sort the atomic intervals (Lemma 3.4), and
then we merge them as we go up the O(log k) levels of the tree (Lemma 3.5), which costs
O(n log k) = O(n log u) for u ≥ k.

A query for the median of the array elements in an interval is now equivalent to
finding the median of m = O(log k) u-sorted arrays A_1, ..., A_m. Using the algorithm of
Theorem 3.6 results in m intervals B_1, ..., B_m that belong to A_1, ..., A_m, respectively, such
that we need to find the k′th smallest element in B_1 ∪ ⋯ ∪ B_m. The total length of the B_i's
is O(mn/u). Now we can just use the brute-force method: merge B_1, ..., B_m into a single
array and find the k′th smallest element using the classical algorithm. This takes O(mn/u)
comparisons. We have to repeat this k times, and the number of comparisons we need is

$$O\left(km\,\frac{n}{u} + km\log n\right) = O(n + k\log k\log n)$$

for u = k², since m = O(log k). Thus, in all, the number of comparisons used by the
algorithm is O(n log k + k log k log n). ■

We can extend this bound to the case when the intervals are presented in an online 
manner, and we get amortized bounds. 

Lemma 3.8 (When k is known in advance.) There is a deterministic algorithm to solve
the k-range-medians problem in O(n log k + k log k log n) time when the k query intervals
are provided in an online fashion, but k is known in advance.

Proof: The idea is to partition the array into u = k² atomic intervals, all of the same
length, and build the data structure over these atomic intervals. The above algorithm
works verbatim, except that for every query interval I there are two "dangling" atomic
intervals, of size n/u, that contain the two endpoints of I.

Specifically, to perform the query for I, we compute m = O(log k) u-sorted arrays using
our data structure. We also take these two atomic intervals, clip them to the query interval,
u-sort them, and add them to the m u-sorted arrays we already have. Now, we need to
perform the median query over these O(log k) u-sorted arrays, which we can do as described
above. Clearly, the resulting algorithm has running time

$$O\left(n\log u + k\log u\log n + k\,\frac{n}{u}\log u\right) = O(n\log k + k\log k\log n),$$

since u = k². ■
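The only new ingredient in the proof of Lemma 3.8 is splitting a query into two dangling pieces plus a run of whole atomic intervals. A hypothetical helper for that split might look as follows; it assumes blocks of length ⌈n/u⌉ (so the last block may be shorter) and 1-indexed inclusive positions, and it is our illustration rather than part of the original algorithm description.

```python
def decompose_query(n, u, l, r):
    """Split the 1-indexed inclusive query [l, r] over equal-length blocks.

    Blocks have length w = ceil(n/u); block b (0-indexed) covers positions
    b*w + 1 .. min((b + 1)*w, n). Returns (left, full_blocks, right), where
    left and right are the clipped "dangling" pieces as (start, end) pairs
    (right is None when the query lies inside a single block) and
    full_blocks is the range of blocks lying entirely inside [l, r].
    """
    w = -(-n // u)                        # ceil(n / u)
    bl, br = (l - 1) // w, (r - 1) // w   # blocks containing l and r
    if bl == br:
        return (l, r), range(0), None
    return (l, (bl + 1) * w), range(bl + 1, br), (br * w + 1, r)
```

The two clipped pieces are the ones the proof u-sorts on the fly at O((n/u) log u) cost per query; the whole blocks in `full_blocks` are covered by O(log u) canonical nodes of the prebuilt tree.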

Lemma 3.9 (When k is not known in advance.) There is a deterministic algorithm to
solve the k-range-medians problem in O(n log k + k log k log n) time when the k query
intervals are provided in an online fashion.

Proof: We use the algorithm of Lemma 3.8.

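At each stage, we have a current guess for the number of queries to be performed. In the
beginning, this guess is a constant, say 10. When this number of queries is exceeded, we square
our guess, rebuild our data structure from scratch for this new guess, and continue. Let
k_0 = 10 and k_i = k_{i−1}² be the sequence of guesses, for i = 1, ..., β, where β = O(log log k).
Let q_i ≤ k_i denote the number of queries answered while the guess is k_i, so that Σ_i q_i = k.
By (the proof of) Lemma 3.8, the total running time of the algorithm is

$$\sum_{i=0}^{\beta} O\!\left(n\log k_i + q_i\log k_i\log n\right) = O(n\log k + k\log k\log n),$$

since log k_{i−1} = (log k_i)/2 for all i, so that Σ_i log k_i ≤ 2 log k_β, and (when β ≥ 1)
log k_β = 2 log k_{β−1} < 2 log k because the guess k_{β−1} was exceeded. ■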

Lemma 3.9 implies the algorithmic part of Theorem 1.1.
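The geometric behaviour behind the proof of Lemma 3.9 is easy to check numerically. The sketch below (hypothetical helper names) generates the squaring schedule of guesses and sums log₂ k_i, the factor multiplying n in the total rebuild cost; by the argument in the proof, this sum stays below 4 log₂ k for every k ≥ 2.

```python
import math


def guess_schedule(k, k0=10):
    """Guesses k_0 = k0, k_i = k_{i-1}**2, squaring while the guess is below k."""
    guesses = [k0]
    while guesses[-1] < k:
        guesses.append(guesses[-1] ** 2)
    return guesses


def total_log_guesses(k):
    """Sum of log2(k_i) over all guesses used for k queries."""
    return sum(math.log2(g) for g in guess_schedule(k))
```

Because each log₂ k_i is half the next one, the sum is dominated by its last term, which is why rebuilding from scratch costs only a constant factor.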

4 Concluding Remarks 

The k-range-medians problem is a natural interval generalization of the classical median
finding problem: unlike interval generalizations of other problems such as max, min or sum,
which can be solved in linear time, our problem (surprisingly) needs Ω(n log k) comparisons,
and we present an algorithm that solves this problem with running time (and number of
comparisons) O(n log k + k log k log n). A number of technical problems remain, and we list
them below.

• Currently, our algorithm uses O(n log k) space. It would be interesting to reduce this
to linear space.

• Say the elements are from an integer range 1, ..., U. Can we design o(n log k)-time
algorithms in that case using word operations? For the classical median finding problem,
both comparison-based and word-based algorithms take O(n) time. But given that
comparison-based algorithms need Ω(n log k) comparisons for our k-range-medians
problem, it now becomes interesting whether word-based algorithms can do better for an integer
alphabet.

• Say one wants to answer median queries only approximately for each interval (see
[BKMT05] for some relevant results). Can one design o(n log k) algorithms?

Suppose the elements are integers in the range 1, ..., U. We define an approximate
version where the goal is to return an element within a factor (1 ± ε) of the correct median
in value, for some fixed ε, 0 < ε < 1. Then we can keep, with each atomic interval, an
exponential histogram of the number of elements in the range [(1 + ε)^i, (1 + ε)^{i+1}) for each
i, and follow the algorithm outlined here, constructing such histograms for all the suitably
chosen intervals on the balanced binary tree atop these atomic intervals. For each interval
in the query, one can easily merge the corresponding exponential histograms and
obtain an algorithm that takes O(n + k log k log U) time, since any two exponential
histograms can be merged in O(log U) time. If the elements are not integers in the range
1, ..., U and one works in the comparison model, similar results may be obtained
using [GK01, GK04] or ε-nets. It is not clear whether these bounds are optimal.

• We believe that extending the problem to two (or more) dimensions is also of interest. There
is prior work for range sums and minimums, but tight bounds for k range medians would
be interesting.
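A minimal sketch of the exponential-histogram approach from the approximate-median item, for positive integer values: each piece stores how many of its elements fall in each bucket [(1 + ε)^i, (1 + ε)^{i+1}), histograms of disjoint pieces merge by adding counts, and the approximate median is the lower end of the bucket holding the element of rank ⌈m/2⌉, which is within a factor 1 + ε of the true median. The function names, the use of floating-point logarithms for bucketing, and the rank convention are our assumptions.

```python
import math
from collections import Counter


def exp_histogram(values, eps):
    """Count elements per bucket i, where bucket i is [(1+eps)^i, (1+eps)^(i+1)).

    Values are assumed to be positive integers. Floating-point logarithms may
    misplace a value lying exactly on a bucket boundary by one bucket.
    """
    base = math.log1p(eps)                 # ln(1 + eps)
    return Counter(math.floor(math.log(v) / base) for v in values)


def merge_histograms(*hists):
    """Histograms of disjoint pieces combine by adding bucket counts."""
    total = Counter()
    for h in hists:
        total.update(h)                    # Counter.update adds counts
    return total


def approx_median(hist, eps):
    """Return (1+eps)^i for the bucket i holding the element of rank ceil(m/2)."""
    target = (sum(hist.values()) + 1) // 2
    seen = 0
    for i in sorted(hist):
        seen += hist[i]
        if seen >= target:
            return (1 + eps) ** i
```

For fixed ε a histogram has O(log U) buckets, so merging two of them costs O(log U), as stated in the text.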
A Choosing the median from sorted arrays

In this section, we prove Theorem 3.1 by providing a fast deterministic algorithm for choosing
the median element of ℓ sorted arrays. As we mentioned before, this result seems to be known,
but we are unaware of a direct reference to it, and as such we provide a detailed algorithm.
A.1 The algorithm

Let A_1, ..., A_ℓ be the given sorted arrays of total size n. We maintain ℓ active ranges [l_i, r_i]
of the arrays A_i where the required element (i.e., the "median") lies, for i = 1, ..., ℓ. Let k
denote the rank of the required median among the active elements. Let
n_curr = Σ_{i=1}^{ℓ} (r_i − l_i + 1) be the total number of currently active elements.

If n_curr < 32ℓ, then we find the median in linear time, using the standard deterministic
algorithm. Otherwise, let Δ = ⌊n_curr/(32ℓ)⌋. Pick u_i − 1 equally spaced elements from the
active range of A_i, where

$$u_i = 1 + \left\lfloor \frac{r_i - l_i + 1}{\Delta} \right\rfloor .$$

Let L_i be the resulting list of representatives, for i = 1, ..., ℓ. Note that L_i breaks the active
range of A_i into blocks of size at most

$$\Delta_i = \left\lceil \frac{r_i - l_i + 1}{u_i} \right\rceil \le \Delta .$$

For each element of L_i we know exactly how many elements are smaller than it and larger
than it in the ith array. Merge the lists L_1, ..., L_ℓ into one sorted list L. For an element x,
let rank(x) denote the rank of x in the set A_1 ∪ ⋯ ∪ A_ℓ. Note that now, for every element x
of L, we can estimate rank(x) to lie within an interval of length T = Σ_{i=1}^{ℓ} Δ_i. Indeed, for an
element x ∈ L we know between which two consecutive representatives it lies, for all ℓ
arrays. For an element x ∈ L, let R(x) denote this range where the rank of x might lie.

Now, consider two consecutive representatives x and y in the ith array. If every value in R(y)
is smaller than k, or every value in R(x) is larger than k, then the required median cannot lie
between x and y, and we can shrink the active range to exclude this portion. In particular, the
new active range spans all the blocks that might contain the median. The algorithm now
updates the value of k (subtracting the number of discarded elements that are smaller than
the median) and continues recursively on the new active ranges.

A.2 Analysis

The error estimate for the rank of a representative is bounded by

$$T = \sum_{i=1}^{\ell} \Delta_i \le \ell\Delta \le \frac{n_{\mathrm{curr}}}{32},$$

since Δ_i ≤ Δ and by the choice of Δ.

Consider the sorted merged array B of all the active elements. The length of B is n_curr;
assume, for simplicity of exposition, that the desired median is in the second half of B (the
other case follows by a symmetric argument). Note that any representative x that falls in the
first quarter of B has rank at most n_curr/4, and its estimated range R(x) has length at most
T ≤ n_curr/32, so R(x) lies entirely below n_curr/2 and, as such, cannot include k. In particular,
let t_i be the index in A_i of the first representative in the active range (of A_i) that does not
fall in the first quarter of B. Observe that Σ_i (t_i − l_i + 1) ≥ n_curr/4. Every element of A_i up
to the representative preceding t_i is discarded, so the total number of elements eliminated by
the algorithm (at the top of the recursion) is at least

$$\sum_{i=1}^{\ell} (t_i - l_i + 1) - \sum_{i=1}^{\ell} (\Delta_i + 1) \ge \frac{n_{\mathrm{curr}}}{4} - \ell\Delta - \ell \ge \frac{n_{\mathrm{curr}}}{4} - \frac{n_{\mathrm{curr}}}{16} \ge \frac{n_{\mathrm{curr}}}{8},$$

using ℓΔ ≤ n_curr/32 and ℓ ≤ n_curr/32. Namely, each recursive call continues on active
ranges whose total length is at most a 7/8 fraction of the current total length.

The total length of L_1, ..., L_ℓ is O(ℓ), and as such the total work (ignoring the recursive
call) is bounded by O(ℓ log ℓ). The running time is bounded by

$$T(n_{\mathrm{curr}}) = O(\ell\log\ell) + T\!\left(\tfrac{7}{8}\,n_{\mathrm{curr}}\right),$$

where T(n_curr) = O(ℓ) once n_curr < 32ℓ. Thus, the total running time is O(ℓ log ℓ log(n_curr/ℓ)).

A.3 Doing even better: a faster algorithm

Observe that the bottleneck in the above algorithm is the merging of the representative lists
L_1, ..., L_ℓ. Instead of merging them, we compute the median x of L = L_1 ∪ ⋯ ∪ L_ℓ using
linear-time selection. If R(x) does not contain k, then we can throw away at least n_curr/4
elements in the current active ranges and continue recursively. Otherwise, compute the
element z of rank |L|/4 in L. Clearly, k ∉ R(z), and one can throw away, as above, a constant
fraction of the active ranges. The resulting running time (ignoring the recursive call) is O(ℓ)
(instead of O(ℓ log ℓ)). Thus, the running time of the resulting algorithm is O(ℓ log(n_curr/ℓ)).